MPC protocol


End to End Collaborative Synthetic Data Generation

Pentyala, Sikha, Sitaraman, Geetha, Claar, Trae, De Cock, Martine

arXiv.org Artificial Intelligence

The success of AI is based on the availability of data to train models. While in some cases a single data custodian may have sufficient data to enable AI, often multiple custodians need to collaborate to reach a cumulative size required for meaningful AI research. The latter is, for example, often the case for rare diseases, with each clinical site having data for only a small number of patients. Recent algorithms for federated synthetic data generation are an important step towards collaborative, privacy-preserving data sharing. Existing techniques, however, focus exclusively on synthesizer training, assuming that the training data is already preprocessed and that the desired synthetic data can be delivered in one shot, without any hyperparameter tuning. In this paper, we propose an end-to-end collaborative framework for publishing synthetic data that accounts for privacy-preserving preprocessing as well as evaluation. We instantiate this framework with Secure Multiparty Computation (MPC) protocols and evaluate it in a use case for privacy-preserving publishing of synthetic genomic data for leukemia.


Transformers, parallel computation, and logarithmic depth

Sanford, Clayton, Hsu, Daniel, Telgarsky, Matus

arXiv.org Artificial Intelligence

The transformer (Vaswani et al., 2017) has emerged as the dominant neural architecture for many sequential modeling tasks such as machine translation (Radford et al., 2019) and protein folding (Jumper et al., 2021). Reasons for the success of transformers include suitability to modern hardware and training stability: unlike in recurrent models, inference and training can be efficiently parallelized, and training is less vulnerable to vanishing and exploding gradients. However, the advantages of transformers over other neural architectures can be understood more fundamentally via the lens of representation, which regards neural nets as parameterized functions and asks what they can efficiently compute. Many previous theoretical studies of transformers establish (approximation-theoretic and computational) universality properties, but only at large model sizes (Yun et al., 2020; Pérez et al., 2021). These results are not unique to transformers and reveal little about which tasks can be solved in a size-efficient manner.


Fast and Private Inference of Deep Neural Networks by Co-designing Activation Functions

Diaa, Abdulrahman, Fenaux, Lucas, Humphries, Thomas, Dietz, Marian, Ebrahimianghazani, Faezeh, Kacsmar, Bailey, Li, Xinda, Lukas, Nils, Mahdavi, Rasoul Akhavan, Oya, Simon, Amjadian, Ehsan, Kerschbaum, Florian

arXiv.org Artificial Intelligence

Machine Learning as a Service (MLaaS) is an increasingly popular design where a company with abundant computing resources trains a deep neural network and offers query access for tasks like image classification. The challenge with this design is that MLaaS requires the client to reveal their potentially sensitive queries to the company hosting the model. Multi-party computation (MPC) protects the client's data by allowing encrypted inferences. However, current approaches suffer from prohibitively large inference times. The inference-time bottleneck in MPC is the evaluation of non-linear layers such as ReLU activation functions. Motivated by the success of previous work co-designing machine learning and MPC aspects, we develop an activation function co-design. We replace all ReLUs with a polynomial approximation and evaluate them with single-round MPC protocols, which gives state-of-the-art inference times in wide-area networks. Furthermore, to address the accuracy issues previously encountered with polynomial activations, we propose a novel training algorithm that gives accuracy competitive with plaintext models. Our evaluation shows between 4× and 90× speedups in inference time on large models with up to 23 million parameters, while maintaining competitive inference accuracy.
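The core idea of replacing ReLU with a polynomial can be sketched as follows. This is a hedged illustration, not the paper's actual activation: it fits a generic degree-2 polynomial to ReLU by least squares on a fixed interval, whereas the paper trains models with its own polynomial and training algorithm. The payoff under MPC is that a degree-2 polynomial needs only one secret-shared multiplication (x·x), i.e. a single communication round, instead of the many rounds of an exact secure comparison.

```python
import numpy as np

# Hypothetical sketch: fit a degree-2 polynomial to ReLU on [-5, 5] by least
# squares, as a stand-in for the co-designed activation described in the
# abstract (the paper's actual polynomial and training procedure differ).
def relu(x):
    return np.maximum(x, 0.0)

xs = np.linspace(-5.0, 5.0, 1001)
coeffs = np.polyfit(xs, relu(xs), deg=2)   # [a2, a1, a0]
poly = np.poly1d(coeffs)

# Under MPC, evaluating a2*x^2 + a1*x + a0 costs a single secret-shared
# multiplication round, versus a multi-round comparison protocol for ReLU.
max_err = np.max(np.abs(poly(xs) - relu(xs)))
print("coefficients:", coeffs)
print("max abs error on [-5, 5]:", round(float(max_err), 3))
```

Since ReLU(x) = (x + |x|)/2, the least-squares fit recovers the linear coefficient 0.5 exactly, and the quadratic term approximates the non-smooth |x|/2 part; the residual error is the accuracy gap the paper's training algorithm is designed to close.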


Secure Multiparty Computation for Synthetic Data Generation from Distributed Data

Pereira, Mayana, Pentyala, Sikha, Nascimento, Anderson, Sousa, Rafael T. de Jr., De Cock, Martine

arXiv.org Artificial Intelligence

Legal and ethical restrictions on accessing relevant data inhibit data science research in critical domains such as health, finance, and education. Synthetic data generation algorithms with privacy guarantees are emerging as a paradigm to break this data logjam. Existing approaches, however, assume that the data holders supply their raw data to a trusted curator, who uses it as fuel for synthetic data generation. This severely limits the applicability, as much of the valuable data in the world is locked up in silos, controlled by entities who cannot show their data to each other or a central aggregator without raising privacy concerns. To overcome this roadblock, we propose the first solution in which data holders only share encrypted data for differentially private synthetic data generation. Data holders send shares to servers that perform Secure Multiparty Computation (MPC) while the original data stays encrypted. We instantiate this idea in an MPC protocol for the Multiplicative Weights with Exponential Mechanism (MWEM) algorithm to generate synthetic data based on real data originating from many data holders without reliance on a single point of failure.
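The step where data holders "send shares to servers" can be sketched with additive secret sharing over a prime field. This is a minimal illustration of the sharing primitive only; the paper's actual MWEM-inside-MPC protocol is far more involved.

```python
import secrets

# Minimal sketch of additive secret sharing over a prime field: the basic
# step by which a data holder splits its input before sending one share to
# each MPC server. (The MWEM protocol built on top is omitted.)
P = 2**61 - 1  # a Mersenne prime defining the field

def share(x, n_servers=3):
    """Split integer x into n_servers additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_servers - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

record = 42  # e.g. one attribute of a patient record
s = share(record)
assert reconstruct(s) == record

# Shares are additively homomorphic: servers can sum records share-wise
# without ever seeing any individual value.
other = share(8)
summed = [(a + b) % P for a, b in zip(s, other)]
assert reconstruct(summed) == 50
```

Any subset of fewer than all the shares is uniformly random, which is why no single server (and no coalition short of all of them, in this simple scheme) learns anything about the original record.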


MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference

Wang, Yongqin, Rajat, Rachit, Annavaram, Murali

arXiv.org Artificial Intelligence

Multi-party computation (MPC) has been gaining popularity over the past years as a secure computing model, particularly for machine learning (ML) inference. Compared with its competitors, MPC has lower overhead than homomorphic encryption (HE) and a more robust threat model than hardware-based trusted execution environments (TEE) such as Intel SGX. Despite these advantages, MPC protocols still pay substantial performance penalties compared to plaintext execution when applied to ML algorithms, due to added computation and communication costs. For the multiplications that are ubiquitous in ML algorithms, MPC protocols add 32x more computational cost and one round of broadcasting among the MPC servers. Moreover, ML computations that have trivial cost in plaintext, such as Softmax, ReLU, and other non-linear operations, become very expensive due to added communication. These overheads make MPC less palatable to deploy in real-time ML inference frameworks, such as speech translation. In this work, we present MPC-Pipe, an MPC pipeline inference technique that uses two ML-specific approaches: (1) an inter-linear-layer pipeline and (2) an inner-layer pipeline. These two techniques shorten the total inference runtime of machine learning models. Our experiments show that MPC-Pipe reduces ML inference latency by up to 12.6% when model weights are private and 14.48% when model weights are public, compared to current MPC protocol implementations.
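The "one round of broadcasting" per multiplication that the abstract mentions is the classic Beaver-triple trick, which can be sketched as follows. This is a generic textbook illustration under assumed two-party additive sharing, not MPC-Pipe's own protocol: the parties hold shares of a random triple (a, b, c) with c = a·b, open the masked values d = x − a and e = y − b in a single broadcast round, and then compute shares of x·y locally.

```python
import secrets

P = 2**61 - 1  # prime field for additive secret sharing

def share(x, n=2):
    s = [secrets.randbelow(P) for _ in range(n - 1)]
    s.append((x - sum(s)) % P)
    return s

def open_shares(shares):
    return sum(shares) % P

# Beaver multiplication: opening d and e is the single communication round;
# everything else is local arithmetic on shares.
def beaver_mul(xs, ys, triple):
    a_s, b_s, c_s = triple
    d = open_shares([(xi - ai) % P for xi, ai in zip(xs, a_s)])  # broadcast
    e = open_shares([(yi - bi) % P for yi, bi in zip(ys, b_s)])  # broadcast
    zs = []
    for i, (ai, bi, ci) in enumerate(zip(a_s, b_s, c_s)):
        z = (ci + d * bi + e * ai) % P
        if i == 0:                # the public d*e term is added by one party
            z = (z + d * e) % P
        zs.append(z)
    return zs

a, b = secrets.randbelow(P), secrets.randbelow(P)
triple = (share(a), share(b), share((a * b) % P))
x, y = 1234, 5678
z = open_shares(beaver_mul(share(x), share(y), triple))
print(z)  # 7006652, i.e. 1234 * 5678
```

Correctness follows from x·y = c + d·b + e·a + d·e with d = x − a, e = y − b; the extra local multiplications and the broadcast round are exactly the per-multiplication overhead that MPC-Pipe's pipelining tries to hide.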


Training Differentially Private Models with Secure Multiparty Computation

Pentyala, Sikha, Railsback, Davis, Maia, Ricardo, Dowsley, Rafael, Melanson, David, Nascimento, Anderson, De Cock, Martine

arXiv.org Artificial Intelligence

We address the problem of learning a machine learning model from training data that originates at multiple data owners, while providing formal privacy guarantees regarding the protection of each owner's data. Existing solutions based on Differential Privacy (DP) achieve this at the cost of a drop in accuracy. Solutions based on Secure Multiparty Computation (MPC) do not incur such accuracy loss but leak information when the trained model is made publicly available. We propose an MPC solution for training DP models. Our solution relies on an MPC protocol for model training, and an MPC protocol for perturbing the trained model coefficients with Laplace noise in a privacy-preserving manner. The resulting MPC+DP approach achieves higher accuracy than a pure DP approach, while providing the same formal privacy guarantees. Our work obtained first place in the iDASH2021 Track III competition on confidential computing for secure genome analysis.
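The final perturbation step, adding Laplace noise to the trained coefficients, can be sketched as below. In the paper this happens inside MPC so that no party ever sees the unperturbed model; the sketch shows it in the clear for illustration only, with a hypothetical sensitivity bound.

```python
import numpy as np

# Hedged sketch of the Laplace mechanism applied to model coefficients for
# epsilon-DP. The sensitivity value is a made-up placeholder; in practice it
# must be derived from the training procedure, and in the paper the noise is
# generated and added under MPC.
def laplace_perturb(coefs, sensitivity, epsilon, rng=None):
    rng = np.random.default_rng(rng)
    scale = sensitivity / epsilon  # Laplace scale b = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=coefs.shape)
    return coefs + noise

coefs = np.array([0.8, -1.2, 0.3])  # toy trained coefficients
private_coefs = laplace_perturb(coefs, sensitivity=0.1, epsilon=1.0, rng=0)
print(private_coefs)
```

The accuracy advantage the abstract claims comes from where the noise is injected: a pure DP approach perturbs throughout training, while the MPC+DP approach trains exactly on the joint data and perturbs only the final coefficients once.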


PrivFairFL: Privacy-Preserving Group Fairness in Federated Learning

Pentyala, Sikha, Neophytou, Nicola, Nascimento, Anderson, De Cock, Martine, Farnadi, Golnoosh

arXiv.org Artificial Intelligence

Group fairness ensures that the outcomes of machine learning (ML) based decision-making systems are not biased towards a certain group of people defined by a sensitive attribute such as gender or ethnicity. Achieving group fairness in Federated Learning (FL) is challenging because mitigating bias inherently requires using the sensitive attribute values of all clients, while FL is aimed precisely at protecting privacy by not giving access to the clients' data. As we show in this paper, this conflict between fairness and privacy in FL can be resolved by combining FL with Secure Multiparty Computation (MPC) and Differential Privacy (DP). In doing so, we propose a method for training group-fair ML models in cross-device FL under complete and formal privacy guarantees, without requiring the clients to disclose their sensitive attribute values. Empirical evaluations on real-world datasets demonstrate the effectiveness of our solution to train fair and accurate ML models in federated cross-device setups with privacy guarantees to the users.


Multi-Party Computation on Machine Learning - Security Boulevard

#artificialintelligence

During my internship this summer, I built a multi-party computation (MPC) tool that implements a 3-party computation protocol for perceptron and support vector machine (SVM) algorithms. MPC enables multiple parties to perform analyses on private datasets without sharing them with each other. I developed a technique that lets three parties obtain the results of machine learning across non-public datasets, making it possible to perform data analytics on private datasets that were previously off-limits due to data privacy constraints. In an MPC protocol, a group of parties, each holding its own secret data xi, jointly evaluate an agreed-upon function f such that each party obtains the output f(x1,…,xn) without learning the private data of the other parties.
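The definition in the last sentence can be made concrete with the simplest possible f, a secure sum among three parties. This toy sketch (not the post's perceptron/SVM tool) has each party split its input into three random additive shares, hand one to each peer, and publish the sum of the shares it holds; the published sums reconstruct f(x1, x2, x3) = x1 + x2 + x3 while each individual input stays hidden.

```python
import secrets

P = 2**31 - 1  # prime modulus for additive sharing

# Toy 3-party secure sum: each party i splits its private input into three
# random shares that sum to the input mod P, and sends one share to each
# party (dealt[i][j] goes to party j).
def split(x):
    r1, r2 = secrets.randbelow(P), secrets.randbelow(P)
    return [r1, r2, (x - r1 - r2) % P]

inputs = [5, 11, 7]                  # the three parties' private inputs
dealt = [split(x) for x in inputs]

# Each party j sums the shares it received and publishes only that sum.
held = [sum(dealt[i][j] for i in range(3)) % P for j in range(3)]

# The published partial sums reconstruct f(x1, x2, x3) = x1 + x2 + x3.
total = sum(held) % P
print(total)  # 23
```

Each party sees only uniformly random shares of the others' inputs, so nothing beyond the final output leaks; real protocols for perceptrons or SVMs build comparisons and multiplications on top of this same sharing idea.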